Adaptive Coherence Batching for Trap-Based Memory Architectures

Authors

Håkan Zeffer

Erik Hagersten

Abstract

Both software-initiated and hardware-initiated prefetching have been used to accelerate shared-memory server performance. While software-initiated prefetching requires instruction-set and compiler support, hardware prefetching often requires additional hardware structures or extra memory state. The coherence batching scheme proposed in this paper keeps the system completely binary transparent and does not rely on any additional hardware. Hence, it can be implemented in software-coherent systems at no hardware cost and can improve the performance of already optimized and compiled binaries. We have evaluated our proposals on a trap-based memory architecture, in which fine-grained coherence permission checks are done in hardware but the coherence protocol is run in software on the requesting processor. Functional full-system simulation shows that our software-only coherence-batch scheme reduces the number of coherence misses by up to 60 percent compared to a system without coherence batching. The average miss reduction is 37 percent, while average bandwidth usage is also reduced.
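The abstract does not spell out the batching policy, so the following Python sketch is a hypothetical model, not the authors' mechanism. It only illustrates the general idea of coherence batching in a trap-based system: when a permission check fails, a software handler on the requesting node fetches the missing block and, in the same trap, a batch of neighbouring blocks. The adaptive rule (double the batch when at least 75 percent of batched blocks are used before being invalidated, halve it when fewer than 25 percent are) and every name here (`BatchingNode`, `access`, `invalidate`, `max_batch`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BatchingNode:
    """Hypothetical model of one node's software coherence handler."""
    num_blocks: int
    max_batch: int = 8
    batch: int = 2                                 # current batch size (adaptive)
    valid: set = field(default_factory=set)        # blocks with local access permission
    prefetched: set = field(default_factory=set)   # batched-in blocks not yet touched
    misses: int = 0                                # permission traps taken
    useful: int = 0                                # batched blocks later accessed
    useless: int = 0                               # batched blocks invalidated unused

    def access(self, block: int) -> None:
        """Load from `block`; trap to the software handler if permission is missing."""
        if block in self.valid:
            if block in self.prefetched:           # a batched block paid off
                self.prefetched.discard(block)
                self.useful += 1
            return
        self.misses += 1
        self._coherence_trap(block)

    def invalidate(self, block: int) -> None:
        """Another node wrote `block`, so the local copy loses its permission."""
        self.valid.discard(block)
        if block in self.prefetched:               # batched block was never used
            self.prefetched.discard(block)
            self.useless += 1

    def _coherence_trap(self, block: int) -> None:
        # Assumed batching rule: fetch the missing block plus the next
        # (batch - 1) neighbours that are not already valid locally.
        self.valid.add(block)
        for b in range(block + 1, min(block + self.batch, self.num_blocks)):
            if b not in self.valid:
                self.valid.add(b)
                self.prefetched.add(b)
        self._adapt()

    def _adapt(self) -> None:
        # Assumed adaptive policy: after at least 4 observations, grow the
        # batch when batching mostly helps and shrink it when it mostly wastes.
        total = self.useful + self.useless
        if total < 4:
            return
        if self.useful * 4 >= total * 3:
            self.batch = min(self.batch * 2, self.max_batch)
        elif self.useful * 4 < total:
            self.batch = max(self.batch // 2, 1)
        self.useful = self.useless = 0
```

In this toy model, a sequential scan of 64 blocks traps 64 times with `batch=1, max_batch=1`, but only 13 times with the defaults, because the batch grows to 8. Conversely, if other nodes keep invalidating the batched blocks before they are used, the batch shrinks back, which is the trade-off between miss reduction and wasted bandwidth that an adaptive scheme has to manage.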

Similar Resources

Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)

Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...

Scalable directoryless shared memory coherence using execution migration

We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using an execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data repl...

Programming Research Group PRACTICAL BARRIER SYNCHRONISATION

We investigate the performance of barrier synchronisation on both shared-memory and distributed-memory architectures, using a wide range of techniques. The performance results obtained show that distributed-memory architectures behave predictably, although their performance for barrier synchronisation is relatively poor. For shared-memory architectures, a much larger range of implementation tec...

Practical barrier synchronisation

We investigate the performance of barrier synchronisation on both shared-memory and distributed-memory architectures, using a wide range of techniques. The performance results obtained show that distributed-memory architectures behave predictably, although their performance for barrier synchronisation is relatively poor. For shared-memory architectures, a much larger range of implementation te...

University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory A New Cache Protocol Based On The Order Free Consistency Memory Model

Computer architects are now studying a new generation of chip architectures that may integrate hundreds of processing cores and memory banks on a single chip with novel interconnect technologies. A key challenge lies in the design and development of an efficient on-chip shared memory organization for these future many-core architectures. New approaches need to be developed to address this chall...

Publication date: 2005